Abstract


Background: Sample size is the backbone of a scientific study in any field of science. It must be determined at the time of preparing
the proposal for a particular study. An unnecessarily large sample is not needed and is also unethical. Similarly, a sample that is too
small is also unethical.


Materials and Methods: An adequate sample size can be calculated either manually, using established formulae, or with existing statistical
software like Statistical Package for Social Sciences (SPSS), Epi Info or online software, subject to conditions related to the study.


Results: When previous literature is not available, the sample size has to be determined from a pilot study in that region. This
article discusses sample size and its calculation, the conditions to follow when determining sample size, the knowledge needed at the
time of calculating sample size, how much statistical power to fix for a study, sample size and ethics, type-I and type-II errors, and
the interpretation of results from larger and smaller samples.


Conclusion: From this article, I have concluded that a researcher has to write the proposal for his/her study, find the appropriate parent/key
article, identify the risk factors related to the study, and, as the final step, find the sample size with the help of a statistical expert. Only
then should the researcher proceed with the study. The sample size calculation procedure is very useful to young researchers.
Introduction


One of the most important aspects of planning a medical, clinical, epidemiological, or translational study is the calculation of sample size. In
medical research, sample size calculation is an essential tool for all types of studies. [1] It is naturally not feasible to study the whole
population in any type of medical research. Studies are therefore usually conducted on a sample selected with some inclusion and exclusion
criteria. [2] Sample size plays an important role in the analysis as well as in the writing of results, [3] because the conclusions are drawn
from the analysis of the sample determined for a study and are then generalized to the entire population of a region/state/country.
This approach is followed by the National Sampling Survey and all Government organizations. Most Government sample surveys
have been very useful in policy making and decision taking nationwide. [4] The sample size should be adequate before the researcher
starts the study. This is very important and also mandatory. Sample size can be calculated manually
using established and appropriate formulae that already exist in the literature. [5]


Materials and Methods 


Calculation of Sample Size: [6]
A sample is a subset of a study population, and the sample size is the
number of units in it. The sample size is calculated under conditions set
by the inclusion and exclusion criteria. An adequate sample size is fixed
mainly to minimize the money, manpower and time needed to conduct the
study, and to show the Scientific Research Committee (SRC), the
Institutional Ethical Committee (IEC) and the funding agency, at the
time of proposal presentation, that the study has a reasonable chance of
obtaining a correct result. In statistics, sample size is the measure of the
number of individual samples or observations that shall be used in a
study/experiment. Sample size can be estimated in the following three
ways: (a) formulae with manual calculations; (b) sample size tables; (c)
software like Epi Info [7], nMaster 2.0 [8], OpenEpi version 3.01 [9] and
online sample size calculators. Hence, under some conditions (inclusion
and exclusion criteria), a set of participants is selected from the
population. This set is smaller than the population but large enough to
represent the population from which it is drawn, so that true
inferences about the population can be made from the obtained results.
This set of patients/individuals is known as the “sample”, and the number
of individuals in it is the “sample size”, as shown in Figure 1.


Figure 1: Population and selection of a sample


Sample Size, Ethical Considerations and their effects on
the participants


Sample size calculation is an essential and mandatory procedure
in all studies. If the sample size of a study is very large, the sampling
error is reduced and the results are more accurate, so they represent
the study population better. Beyond a certain point, however, adding
more samples has little effect on the accuracy of the study, so the
effort and expense of recruiting those extra patients is not worthwhile
for the researcher. Furthermore, it troubles the extra patients included
in the study, which is unethical. For example, suppose a researcher
takes a larger sample than the calculated sample size, i.e. excess
patients are included in the study. In this situation, those patients
have to face physical and psychological disturbance during face-to-face
interviews, physical check-ups, blood sampling, routine check-ups
and other procedures. These things are avoided when the researcher
conducts the study within the calculated sample size.


Statistical Power [10-12] 


Statistical power means the ability of a study/experimental design and
hypothesis setup to detect a particular effect if it is truly present. In
any medical research, or any research, the statistical power should
always be at least 80%, as shown in Table 1. A researcher should
know how to increase the statistical power of a study. This can be
done in the following ways: increase the potential effect size by
manipulating the independent variable more strongly; increase the
sample size of the study; increase the level of significance (α);
and reduce measurement error by increasing the precision and
accuracy of the measurement equipment and test procedures of the
study.
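
The effect of sample size and significance level on power can be illustrated with a short sketch. It assumes a two-sided comparison of two means with equal group sizes and uses the normal approximation, power ≈ Φ(d / (S·sqrt(2/n)) − Z for α/2). The function name `approx_power` and the input values (d = 10, S = 50, n = 525 per group) are chosen for illustration only and are not prescribed by any guideline.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n_per_group, d, sd, alpha=0.05):
    """Approximate power of a two-sided comparison of two means.

    Normal approximation with equal group sizes and a common SD;
    the negligible contribution of the opposite tail is ignored.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    se = sd * sqrt(2 / n_per_group)       # standard error of the difference
    return z.cdf(d / se - z_alpha)

base = approx_power(525, d=10, sd=50)                     # about 0.90
more_n = approx_power(700, d=10, sd=50)                   # larger sample
more_alpha = approx_power(525, d=10, sd=50, alpha=0.10)   # larger alpha
print(round(base, 3), round(more_n, 3), round(more_alpha, 3))
```

Under these assumptions, both a larger sample and a larger α give a higher power than the baseline, in line with the list above.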


Table 1: Z constant value for β, by conversion
according to the power of the study


Power (1 − β)    80%      85%      90%      95%

Z value          0.8416   1.0364   1.2816   1.6449
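
The constants in Table 1 are quantiles of the standard normal distribution, so they can be reproduced rather than looked up. A minimal sketch using only the Python standard library (the helper name `z_for_power` is an illustrative assumption):

```python
from statistics import NormalDist

def z_for_power(power):
    """Return the standard normal deviate Z_beta for a given power (1 - beta)."""
    return NormalDist().inv_cdf(power)

for power in (0.80, 0.85, 0.90, 0.95):
    print(f"{power:.0%}  {z_for_power(power):.4f}")
```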


Conditions to follow at the time of determining the
sample size for a study: [12]


At the time of sample size calculation, a researcher has to remember
the following conditions.
1. What is the primary objective of the study?
2. Has the researcher chosen an appropriate key
article/parent article for his/her research topic?
3. Does the parent/key article coincide with the primary
objective by 50% or 60%, or at most 80%?
4. What is the main outcome measure of the study? Is it
a continuous or categorical outcome?
5. How will the data be analyzed to detect a group difference, as
mentioned in the statistical analysis part?
6. How small a difference is clinically important to detect?
7. How much variability is there in the study population/group?
8. What are the desired level of significance (α) and Type II error
(β)?
9. What is the anticipated/expected drop-out percentage and
non-response percentage?


Type I Error, Type II Error, and Level of Significance


This type of knowledge can be obtained from previously published or
existing studies/articles, as well as from pilot studies. If
information about the proposed study is lacking, then there is no
good way to calculate the sample size.

Type-I error means rejecting $H_0$ when $H_0$ is true. Type-II error
means failing to reject $H_0$ when $H_0$ is false. The level of
significance (α) is the type-I error rate, β denotes the type-II error
rate, and statistical power (1 − β) is the probability of detecting a
group difference given the size of the effect (Δ) and the sample size
of the trial (N), as shown in Table 2 and Table 3.


Adequate Precision in the process of calculation of
Sample Size:


In a descriptive study, the reliability (or) precision of summary
statistics (mean, proportion) is expressed by giving a “Confidence
Interval (CI)”. The wider the 95% C.I., the less reliable the sample
statistic is, and it may not give an accurate estimate of the true
value of the population parameter.


Table 2: Disease status and test results

Test Results    Disease Status
                Present                        Absent
Positive        True Positive (Sensitivity)    False Positive
Negative        False Negative                 True Negative (Specificity)


Table 3: Type-I error and Type-II error

Test Result     Significant Difference is
                Present ($H_0$ not true)       Absent ($H_0$ is true)
Positive        No error (1 − β)               Type I error (α)
Negative        Type II error (β)              No error (1 − α)


Sample size formulae for various
situations


When standard deviation value was known

$$n = \frac{Z_{\alpha/2}^{2}\, S^{2}}{d^{2}}$$

Here, S = Standard Deviation; d = allowable error; $Z_{\alpha/2}$ = standard
normal value for the chosen level of confidence


Example:


A study is to be conducted to determine the parameter Body Mass
Index in a community. From a previous and recently published article, a
Standard Deviation (S) of 46 was taken. Allowable/sampling error (d)
was taken as 4 and the level of confidence was 99%. How many subjects
should be included in this study?


$Z_{\alpha/2}$ = 2.58, S = 46, Allowable Error (d) = 4

$$n = \frac{Z_{\alpha/2}^{2}\, S^{2}}{d^{2}} = \frac{(2.58)^{2} \times (46)^{2}}{(4)^{2}}$$

n = 880.3 ≈ 881 (Calculated minimum sample size)
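
The arithmetic of the Body Mass Index example can be checked with a few lines of code. This is a minimal sketch of n = Z² S² / d² rounded up to the next whole subject; the function name `n_for_mean` is an illustrative assumption.

```python
from math import ceil

def n_for_mean(z, sd, d):
    """Minimum sample size to estimate a mean: n = z^2 * sd^2 / d^2, rounded up."""
    return ceil(z**2 * sd**2 / d**2)

# Values from the Body Mass Index example: 99% confidence, SD = 46, d = 4
print(n_for_mean(z=2.58, sd=46, d=4))  # 881
```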
When single proportion was given

$$n = \frac{Z_{\alpha/2}^{2}\, P\,(1-P)}{d^{2}}$$

Where, $Z_{\alpha/2}$ = 1.96 for 95% confidence level
$Z_{\alpha/2}$ = 2.58 for 99% confidence level


Example:


The aim is to estimate the proportion of anemic children among
school-going children. In the key article, the anemic proportion was 30%.
The researcher wants to compute the minimum sample size
required for his/her study at 95% confidence level and an allowable
error of up to 4% of the study population.


$Z_{\alpha/2}$ = 1.96; P = 30% = 0.3; d = accuracy of estimate/allowable error =
4% = 0.04

$$n = \frac{Z_{\alpha/2}^{2}\, P\,(1-P)}{d^{2}} = \frac{(1.96)^{2} \times 0.3 \times (1-0.3)}{(0.04)^{2}}$$

n = 504.21 ≈ 505 (Calculated minimum sample size)
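
The anemia example can be reproduced in the same way. The sketch below assumes P and d are entered as fractions (0.30 and 0.04, not 30 and 4); the function name `n_for_proportion` is illustrative.

```python
from math import ceil

def n_for_proportion(z, p, d):
    """Minimum sample size to estimate a proportion: n = z^2 * p * (1 - p) / d^2."""
    return ceil(z**2 * p * (1 - p) / d**2)

# Values from the anemia example: 95% confidence, P = 30%, d = 4%
print(n_for_proportion(z=1.96, p=0.30, d=0.04))  # 505
```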


Three bits of information required to
determine the sample size in clinical study


Researcher fixes probabilities of type I and II errors

Prob (type I error) = Prob (reject $H_0$ when $H_0$ is true) = α

Smaller error => greater precision => need more information =>
need larger sample size

Prob (type II error) = Prob (accept $H_0$ when $H_0$ is false) = β

Statistical Power = 1 − β

More power => smaller error => need larger sample size


Quantities related to the research question (defined by the
researcher): size of the measure of interest to be detected, difference
between two or more means, difference between two or more
proportions, odds ratio, relative risk, correlation, regression
coefficients, change in $R^{2}$, etc.
The magnitude of these values depends on the research question and
objective of the study (for example, clinical relevance).


Clinical Effect Size


Clinical effect size is a measure of (a) the amount of change in a
sample of patients who undergo a treatment, or (b) the amount of
change in a sample of patients who undergo a treatment compared
to a control group. It tells the researcher how much difference
a treatment makes. It is truly an estimate and is often the most
challenging aspect of sample size planning.


A large difference can be detected with a small sample size.


Similarly, a small difference requires a large sample size to be
detected, and detecting it can still be cost-effective and beneficial
to the community.


Sample size formula for comparing
two means

$$n = \frac{2 S^{2}\,(Z_{\alpha} + Z_{\beta})^{2}}{d^{2}}$$

Where S = SD; d = difference between the two means


$Z_{\alpha}$ = 1.96 for 95% confidence level; $Z_{\alpha}$ = 2.58 for 99%
confidence level


$Z_{\beta}$ = 0.842 for 80% power; $Z_{\beta}$ = 1.282 for 90% power


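Sample size formula for comparing
two proportions

$$n = \frac{(Z_{\alpha} + Z_{\beta})^{2}\,(p_{1} q_{1} + p_{2} q_{2})}{(p_{1} - p_{2})^{2}}$$

where $q_{1} = (1 - p_{1})$, $q_{2} = (1 - p_{2})$


Here, $Z_{\alpha}$ = 1.96 for 95% confidence level; $Z_{\alpha}$ = 2.58 for 99%
confidence level


$Z_{\beta}$ = 0.842 for 80% power; $Z_{\beta}$ = 1.282 for 90% power
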
Example:


Does the consumption of large doses of vitamin A in
tablet form prevent breast cancer?


Tumor-registry data show that the incidence rate of breast
cancer over a 1-year period for women aged 45 to 49 years is 150
cases per 100,000. Women are randomized to vitamin A vs. placebo.

$$n = \frac{(Z_{\alpha} + Z_{\beta})^{2}\,(p_{1} q_{1} + p_{2} q_{2})}{(p_{1} - p_{2})^{2}}, \quad \text{where } q_{1} = (1 - p_{1}),\ q_{2} = (1 - p_{2})$$

Test $H_0$: $p_1 = p_2$ vs. $H_1$: $p_1 \neq p_2$
Assume 2-sided test with α = 0.05 and 80% power
$p_1$ = 150 per 100,000 = 0.0015
$p_2$ = 120 per 100,000 = 0.0012 (20% rate reduction)
Δ = $p_1 - p_2$ = 0.0003
$Z_{\alpha} = Z_{1-\alpha/2}$ = 1.96, $Z_{\beta} = Z_{1-\beta}$ = 0.84


n = 234,882 per group. This sample size is too large.
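
The order of magnitude of this result can be checked with a short sketch of the two-proportion formula n = (Zα + Zβ)² (p1 q1 + p2 q2) / (p1 − p2)². The function name `n_two_proportions` is an illustrative assumption. With the rounded constants 1.96 and 0.84, this plain formula gives roughly 234,879 per group, slightly below the 234,882 quoted in the text; the small gap presumably reflects rounding or a slightly different variance term in the original calculation (an assumption, since the intermediate steps are not shown).

```python
from math import ceil

def n_two_proportions(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Per-group sample size: (z_a + z_b)^2 * (p1*q1 + p2*q2) / (p1 - p2)^2."""
    q1, q2 = 1 - p1, 1 - p2
    n = (z_alpha + z_beta) ** 2 * (p1 * q1 + p2 * q2) / (p1 - p2) ** 2
    return ceil(n)

# Vitamin A example: 150 vs. 120 cases per 100,000 women
n = n_two_proportions(0.0015, 0.0012)
print(n)
```

The example shows why rare outcomes combined with small absolute differences lead to very large trials: the denominator (p1 − p2)² is tiny.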


Sample Size Formula to Compare Two
Means from Independent Samples


Null Hypothesis, $H_0$: $\mu_1 = \mu_2$


The calculation needs the α level, the β level (1 − power), the
expected population difference (Δ = |$\mu_1 - \mu_2$|), and the expected
population standard deviations ($\sigma_1$, $\sigma_2$).


Example:


Research question: Does a special diet help to reduce cholesterol
levels?


Suppose a researcher wishes to determine the sample size needed to
detect a 10 mg/dl difference in cholesterol level in a diet intervention
group compared to a control (no diet) group.


Subjects with baseline total cholesterol of at least 300 mg/dl are
randomized.


Group 1: Six-week diet intervention
Group 2: No changes in diet


The investigator wants to compare total cholesterol at the end of
the six-week study.


Statistical Analysis: Two-sample t-test, which compares
two means from independent samples


$H_0$: $\mu_1 = \mu_2$ vs. $H_1$: $\mu_1 \neq \mu_2$

Sample size calculation for continuous
outcome when two independent samples
are given


Test $H_0$: $\mu_1 = \mu_2$ vs. $H_1$: $\mu_1 \neq \mu_2$


Two-sided alternative; assume the outcome is normally distributed.

$$n = \frac{2 S^{2}\,(Z_{\alpha} + Z_{\beta})^{2}}{d^{2}} \quad \text{per group}$$

S = standard deviation; d = difference between two means; $Z_{\alpha}$ = 1.96 for
95% confidence level; $Z_{\beta}$ = 1.282 for 90% power.


Example


Research Question: Does a special diet help to
reduce cholesterol levels?


Test $H_0$: $\mu_1 = \mu_2$ vs. $H_1$: $\mu_1 \neq \mu_2$


Assume 2-sided test with α = 0.05 and 90% power
d = $\mu_1 - \mu_2$ = 10 mg/dl
$\sigma_1 = \sigma_2$ = 50 mg/dl
$Z_{\alpha}$ = 1.96, $Z_{\beta}$ = 1.28
Sample Size = n per group = 525


Suppose 10% loss to follow-up is expected;
adjust n = 525/0.9


Calculated minimum sample size = 584 per group
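
The cholesterol example, including the adjustment for loss to follow-up, can be sketched as follows. The code assumes the per-group formula n = 2 S² (Zα + Zβ)² / d² with the rounded constants 1.96 and 1.28 used in the example; the function names are illustrative.

```python
from math import ceil

def n_two_means(sd, d, z_alpha=1.96, z_beta=1.28):
    """Per-group sample size: 2 * sd^2 * (z_a + z_b)^2 / d^2, rounded up."""
    return ceil(2 * sd**2 * (z_alpha + z_beta) ** 2 / d**2)

def adjust_for_dropout(n, dropout_rate):
    """Inflate n so that about n subjects remain after the expected loss."""
    return ceil(n / (1 - dropout_rate))

n = n_two_means(sd=50, d=10)            # cholesterol example
print(n, adjust_for_dropout(n, 0.10))   # 525 584
```

Dividing by (1 − dropout rate), rather than adding 10% to n, is what reproduces the 584 per group reported in the example.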


In summary: define the research question well; consider the study design,
the type of response variable and the type of data analysis; decide on the
type of difference or change you want to detect (make sure it answers your
research question); choose α and β; and use the appropriate equation for
sample size calculation, or sample size tables or software.


Pragmatic approach to decision taking on
sample size


A researcher has to remember the following steps:


1. Remember that there is no standard answer.
2. Initiate early discussion among research team members.
3. Use correct assumptions to consider various possibilities.
4. Consider other factors like availability of cases, cost and time.
5. Make a balanced choice.
6. Ask if this number gives you a reasonable prospect of coming to
a useful conclusion.
7. If yes, proceed. If not, reformulate your problem for the
particular study.


Sample Size calculation and how to write
the Sample Size calculation for medical
research


Cross-Sectional Study


In cross-sectional studies, the main objective is to estimate the
prevalence of unknown parameter(s) in the study population using a
random sampling method. An adequate sample size is needed to
estimate the population prevalence with a good allowable error.


The adequate sample size for a prevalence study can be calculated
with the following simple formula.

$$n = \frac{Z^{2}\, P\,(1-P)}{d^{2}}$$

Here, n = sample size, Z = the statistic corresponding to the level of
confidence, P = expected prevalence (which can be obtained from
previous similar studies or from a pilot study conducted
by the researchers), and d = precision (corresponding to
effect size). The level of confidence is usually set at 95%, and most
researchers express their results with a 95% confidence interval.
However, in clinical studies, researchers who want to be more
confident can choose a 99% confidence interval.


Choose a key/parent article from the existing literature that is very
close to your research question and that has been published within
the last 5 years.


Example:


As per previously published literature, a study by Singh HV
et al. (2021) [13] on “Prevalence of diabetic retinopathy in
self-reported diabetics among various ethnic groups and associated risk
factors in North-East India: A hospital-based study” determined that the
prevalence of retinopathy was 44.93%. With 80% statistical power,
95% confidence level and 10% allowable error, and using the formula

$$n = \frac{Z_{\alpha/2}^{2}\, p\, q}{d^{2}}$$

with $Z_{\alpha/2}^{2}$ ≈ 4, p = 44.93, q = 100 − p = 55.07 and d = 10% of p = 4.493
(so $d^{2}$ = 20.187), the calculated minimum sample size (N) =
(4 * 44.93 * 55.07)/20.187 = 490.27 ≈ 491 DM patients. We round the
total sample size up to 500 DM patients for our present study.
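
The figures in this example imply that the 10% allowable error is taken relative to the prevalence (d = 0.10 × 44.93 = 4.493) and that Z² is approximated by 4. Under those two assumptions, which are read off the arithmetic rather than stated explicitly, the calculation can be sketched as follows; the function name `n_prevalence_relative` is illustrative.

```python
from math import ceil

def n_prevalence_relative(p_percent, rel_error=0.10, z_squared=4.0):
    """n = Z^2 * p * q / d^2 with p, q in percent and d = rel_error * p."""
    q_percent = 100 - p_percent
    d = rel_error * p_percent          # allowable error relative to p
    return ceil(z_squared * p_percent * q_percent / d**2)

print(n_prevalence_relative(44.93))  # 491
```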


Sample size calculation in case
control Study


A case-control study is a type of epidemiological observational study. It is
often used to identify risk factors that may be associated with a disease by
comparing the risk factors in subjects who have the disease, called
“Cases”, with subjects who do not have the disease,
called “Controls”.


Sample size calculation for unmatched case-control studies needs the
following assumptions: the assumed number of cases and
controls who experienced the risk factor (from similar studies or
from a pilot study), the assumed odds ratio, a level of confidence
of 95%, and a proposed power of the study of at least 80%. Software,
online calculators or reputed books provide the sample size to
researchers/investigators with the appropriate formula. However,
researchers should remember that, in the presence of a significant
confounding factor, a larger sample size is required.
Since confounding variables must be controlled for in any
analysis, a more complex statistical model must be built, which is why
a larger sample is required to achieve significance.


Formula:

$$\text{Sample Size} = \frac{r + 1}{r} \cdot \frac{\bar{p}\,(1 - \bar{p})\,(Z_{\beta} + Z_{\alpha/2})^{2}}{(p_{1} - p_{2})^{2}}$$

Here,
r = Ratio of controls to cases, 1 for equal number of cases and
controls

$\bar{p}$ = Average proportion exposed = (proportion of exposed cases +
proportion of exposed controls)/2

$Z_{\beta}$ = Standard normal variate for power = 0.84 for 80% power
and 1.28 for 90% power. The researcher has to select the power for the
study.

$Z_{\alpha/2}$ = Standard normal variate for level of significance, as mentioned
in the previous section

$p_{1} - p_{2}$ = Effect size (or) difference in proportions expected based on
previous studies. $p_{1}$ = proportion in cases and $p_{2}$ = proportion in
controls.
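
A minimal sketch of the proportion-based case-control formula, assuming it reads Sample Size = ((r + 1)/r) · p̄(1 − p̄)(Zβ + Zα/2)² / (p1 − p2)². The exposure proportions below (40% of cases, 20% of controls) are hypothetical values chosen only to show the arithmetic, and the function name `n_case_control` is illustrative. The result is interpreted here as the number of cases, with r controls per case, which is an assumption about how the formula is applied.

```python
from math import ceil

def n_case_control(p_cases, p_controls, r=1.0, z_alpha=1.96, z_beta=0.84):
    """Sample size for an unmatched case-control study (proportion-based)."""
    p_bar = (p_cases + p_controls) / 2          # average proportion exposed
    n = ((r + 1) / r) * p_bar * (1 - p_bar) * (z_beta + z_alpha) ** 2
    return ceil(n / (p_cases - p_controls) ** 2)

# Hypothetical inputs: 40% of cases and 20% of controls exposed, 1:1 ratio
print(n_case_control(0.40, 0.20))        # 83
# Two controls per case reduces the number of cases needed
print(n_case_control(0.40, 0.20, r=2))
```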


Sample size formula based on standard
deviation in case control Study

$$\text{Sample Size} = \frac{r + 1}{r} \cdot \frac{SD^{2}\,(Z_{\beta} + Z_{\alpha/2})^{2}}{d^{2}}$$

SD = Standard Deviation = Researcher can take the value from
previously published studies

d = Expected mean difference between cases and controls (the value
based on previously existing literature/studies)

r = Ratio of controls to cases (value taken from previous studies)

$Z_{\beta}$ = Standard normal variate for power = 0.84 for 80% power
and 1.28 for 90% power. The researcher has to select the power for the
study.

$Z_{\alpha/2}$ = Standard normal variate for level of significance, as mentioned
in the previous section


Sample size calculation for Cohort
Study:

$$\text{Sample Size} = \frac{\left[ Z_{\alpha} \sqrt{\left(1 + \frac{1}{m}\right) p^{*} (1 - p^{*})} + Z_{\beta} \sqrt{\frac{p_{1} (1 - p_{1})}{m} + p_{2} (1 - p_{2})} \right]^{2}}{(p_{1} - p_{2})^{2}}$$

Here,


$Z_{\alpha}$ = Standard normal variate for Level of Significance

m = Number of control subjects per experimental subject

$Z_{\beta}$ = Standard normal variate for power (or) Type 2 error, as explained
in the earlier section

$p_{1}$ = Probability of events in Control group

$p_{2}$ = Probability of events in Experimental group


$$p^{*} = \frac{p_{2} + m\, p_{1}}{m + 1}$$
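
A minimal sketch of the cohort calculation, assuming the formula reads Sample Size = [Zα·sqrt((1 + 1/m) p*(1 − p*)) + Zβ·sqrt(p1(1 − p1)/m + p2(1 − p2))]² / (p1 − p2)² with p* = (p2 + m·p1)/(m + 1), where m is the number of control subjects per experimental subject. The event probabilities used below are hypothetical, and the function name `n_cohort` is illustrative.

```python
from math import ceil, sqrt

def n_cohort(p1, p2, m=1.0, z_alpha=1.96, z_beta=0.84):
    """Sample size from the cohort formula; m = controls per experimental subject."""
    p_star = (p2 + m * p1) / (m + 1)            # weighted average event probability
    term_alpha = z_alpha * sqrt((1 + 1 / m) * p_star * (1 - p_star))
    term_beta = z_beta * sqrt(p1 * (1 - p1) / m + p2 * (1 - p2))
    return ceil((term_alpha + term_beta) ** 2 / (p1 - p2) ** 2)

# Hypothetical inputs: event probability 20% in controls, 10% in the experimental group
print(n_cohort(p1=0.20, p2=0.10))  # 199
```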


Sample size calculation in Clinical Trials


If the sample size is too small, even a well-conducted clinical trial
may fail to answer the research hypothesis. Moreover, it
may fail to find important effects and relationships. The minimum
information needed to calculate the sample size for a randomized
controlled trial includes the statistical power, the level of significance,
the underlying event rate in the population and the size of the treatment
effect sought. In addition, the calculated sample size should be adjusted
for other factors, including expected compliance rates and, less
commonly, an unequal allocation ratio. There are some
recommendations on sample size for the different phases of clinical
trials. Phase-I trials, which involve drug safety in human
participants/volunteers, might initially require a total of around
20 to 80 patients. Phase-II trials, which investigate treatment
effects, seldom require more than 100 to 200 patients.


Formula:

$$\text{Sample Size} = \frac{2\, SD^{2}\,(Z_{\alpha/2} + Z_{\beta})^{2}}{d^{2}}$$

Here,


SD = Standard Deviation = Researcher can take the value from
previously published studies

d = Effect Size = Difference between mean values (the value based
on previously existing literature/studies, i.e., Key Article)

$Z_{\beta}$ = Standard normal variate for power = 0.84 for 80% power
and 1.28 for 90% power. The researcher has to select the power for the
study.

$Z_{\alpha/2}$ = 1.96 (level of significance at 5%) from the Z table


Free software to calculate sample size, with links, is shown in
Table 4.

Sample Size Calculation in Animal
studies: [14]

There are two methods of calculating sample size in animal
studies. The most preferred method is the same power analysis method
that has been described for sample size calculation for testing a hypothesis.
Table 4: Free software to calculate sample size and its links


Sl. No.  Name of the Software                                                       Link
1.       OpenEpi                                                                    https://www.openepi.com/SampleSize/SSPropor.htm
2.       Epi Tools Epidemiological Calculators                                      https://epitools.ausvet.com.au/samplesize
3.       bioMath                                                                    http://biomath.info/power/
4.       ClinCalc                                                                   https://clincalc.com/stats/samplesize.aspx
5.       Stat Pages                                                                 https://statpages.info/
6.       Free Statistics                                                            https://www.wessa.net/sample.wasp
7.       Free Statistical Calculator                                                https://www.danielsoper.com/statcalc/default.aspx
8.       G*Power Software                                                           https://stats.oarc.ucla.edu/other/gpower/
9.       UCLA: Statistical Methods and Data Analytics                               https://stats.oarc.ucla.edu/other/gpower/one-way-anova-power-analysis/
10.      Statistical Considerations for clinical trials and scientific experiments  https://hedwig.mgh.harvard.edu/sample_size/size.html
11.      Social Science Statistics                                                  https://www.socscistatistics.com/tests/
12.      Datatab                                                                    https://datatab.net/statistics-calculator/descriptive-statistics
13.      MedCalc                                                                    https://www.medcalc.org/calc/
14.      Select Statistical Services                                                https://select-statistics.co.uk/calculators/

Calculating the sample size by power analysis takes a lot of effort, and it is not suitable for every situation, for instance when inputs
such as standard deviation, effect size and others cannot be assumed. In that condition a second method, called the “resource equation
method”, can be used. In this method the value ‘E’ is calculated for a proposed sample size. If ‘E’ lies within 10 to 20, the sample
size is considered adequate. If the value of ‘E’ is <10, more animals should be included, and if it is >20, the sample size should be decreased.


Formula: E = (Total no. of animals) − (Total no. of groups)

Suppose in an animal study a researcher formed 4 groups of animals having 8 animals each for different interventions; then the total
number of animals = 32 (4 × 8).

Hence, E = 32 − 4 = 28

This is >20, hence the number of animals in each group should be decreased. So, if the researcher takes 5 rats in each group, then
E = 20 − 4 = 16


E is 16, which lies within 10 to 20; hence five rats per group for four groups can be considered an appropriate sample size. This is a crude
method and should be used only if the sample size cannot be calculated by the power analysis method explained above.
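
The resource equation check is simple enough to sketch in a few lines. The code below assumes equal group sizes and the 10 to 20 acceptance range described above; the function names are illustrative.

```python
def resource_equation_e(groups, animals_per_group):
    """E = total number of animals - total number of groups."""
    return groups * animals_per_group - groups

def is_adequate(e, low=10, high=20):
    """True when E lies within the 10 to 20 range described in the text."""
    return low <= e <= high

for per_group in (8, 5):
    e = resource_equation_e(4, per_group)
    print(per_group, e, is_adequate(e))
```

For the two designs in the worked example, this reports E = 28 (too many animals) for 8 per group and E = 16 (acceptable) for 5 per group.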
Conclusion


From this article, I have concluded that a researcher has to write the
proposal for his/her study, find the appropriate parent/key article,
identify the risk factors related to the study, and, as the final step,
find the sample size with the help of a statistical expert. Only then
should the researcher proceed with the study.


Limitations in calculating sample size
are as follows:


1. A sample size calculated using the above formulae depends on the
chosen Type-I and Type-II error levels and on a few assumptions
about effect size and standard deviation.


2. An adequate and representative sample size always has to
be calculated before initiating any study/research/survey and, as
far as possible, should not be changed during the course of the study.


3. Sample size calculation is also influenced by a few practical
issues, such as administrative constraints and costs.